We will study a data set on the spread of Middle East Respiratory Syndrome Coronavirus (MERS-CoV), compiled and made available by Andrew Rambaut on GitHub.
MERS-CoV is a positive-sense, single-stranded RNA betacoronavirus. Its closest relatives are the SARS coronavirus, common-cold coronavirus, and other human betacoronaviruses. MERS-CoV first emerged in Saudi Arabia in 2012. It causes a severe respiratory illness. Transmission to humans may be direct (person-to-person), particularly in hospitals, or from contact with infected animals.
Exposure to camels is associated with many cases, although bats, particularly the Egyptian tomb bat (Taphozous perforatus), are suspected to be the maintenance reservoir. The case fatality rate is around 40%.
Download the data from https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv, then load the file “cases.csv” and the packages used below:
mers <- read.csv('cases.csv')
library(lubridate)
library(ggplot2)
library(plotly)
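Before going further, it can help to confirm that the columns used in the later steps were read in as expected. The following is a minimal sketch that uses a small inline CSV as a stand-in for cases.csv. Only the column names country, onset and hospitalized come from the code in this tutorial. The object names and all values are invented for illustration.

```r
# Hypothetical three-row stand-in for cases.csv (values are invented)
toy_csv <- "country,onset,hospitalized
KSA,2012-06-06,2012-06-13
Jordan,2012-03-21,
KSA,,2013-05-02"

toy <- read.csv(text = toy_csv, stringsAsFactors = FALSE)

# Columns that the later steps rely on
needed <- c("country", "onset", "hospitalized")
missing_cols <- setdiff(needed, names(toy))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

str(toy)                         # column types as read from the text
colSums(is.na(toy) | toy == "")  # empty or missing entries per column
```

The same checks can be run on mers after read.csv('cases.csv').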
Format the data and correct errors
1. Create columns ‘onset2’ and ‘hospitalized2’ by converting the date strings in ‘onset’ and ‘hospitalized’ into Date objects (ymd)
mers$onset2 <- ymd(mers$onset)
mers$hospitalized2 <- ymd(mers$hospitalized)
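Entries that are empty or not written as year-month-day cannot be parsed, and ymd returns NA for them. This is why missing values have to be handled in the next step. A minimal sketch with invented strings (raw_dates and parsed are illustrative names, not part of the data set):

```r
library(lubridate)

# Invented example strings; the last two are deliberately malformed
raw_dates <- c("2012-03-21", "2013-05-01", "", "unknown")

# quiet = TRUE suppresses the "failed to parse" warning
parsed <- ymd(raw_dates, quiet = TRUE)

class(parsed)        # "Date"
sum(is.na(parsed))   # number of entries that could not be parsed
```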
2. Drop the missing values from column onset2 (na.omit) and find the earliest onset date, which is day 0 of the outbreak
day0 <- min(na.omit(mers$onset2))
3. Translate onset dates into epidemic days, the number of days elapsed since day0 (as.numeric)
mers$epi.day <- as.numeric(mers$onset2 - day0)
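Steps 2 and 3 can be checked on a small vector. The sketch below uses invented dates and base R only. Without removing the missing value, min would return NA and every epidemic day would be NA as well.

```r
# Invented onset dates, including one missing value
onset <- as.Date(c("2012-04-02", NA, "2012-03-21", "2012-03-22"))

# na.rm = TRUE has the same effect here as wrapping the vector in na.omit()
day0 <- min(onset, na.rm = TRUE)

# Subtracting two Dates gives a difftime in days; as.numeric drops the units
epi_day <- as.numeric(onset - day0)
epi_day  # 12 NA 0 1
```

Cases with no recorded onset keep an NA epidemic day and are dropped by ggplot2 when plotting.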
4. Use ggplot2 to draw a bar plot of case counts per epidemic day, with a different fill color for each country
ggplot(data=mers) +
  geom_bar(mapping=aes(x=epi.day, fill=country)) +
  labs(x='Epidemic day', y='Case count', title='Global count of MERS cases by date of symptom onset',
       caption="Data from: https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv")
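geom_bar does the counting itself: the height of each bar is the number of rows that share an epidemic day, and the fill splits that count by country. A rough base R equivalent of that tally, on an invented data frame (toy and counts are illustrative names):

```r
# Invented epidemic days and countries
toy <- data.frame(
  epi.day = c(0, 0, 1, 1, 1, 3),
  country = c("Jordan", "Jordan", "KSA", "KSA", "Jordan", "KSA")
)

# Cross-tabulate: rows are epidemic days, columns are countries
counts <- table(toy$epi.day, toy$country)
counts

# The total bar height for each day is the row total
rowSums(counts)
```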
5. Calculate the infectious period as the time from symptom onset to hospitalization, in days, and plot its distribution as a histogram
mers$infectious.period <- mers$hospitalized2-mers$onset2
mers$infectious.period <- as.numeric(mers$infectious.period, units = "days")
ggplot(data=mers) +
  geom_histogram(aes(x=infectious.period)) +
  labs(x='Infectious period', y='Frequency', title='Distribution of calculated MERS infectious period',
       caption="Data from: https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv")
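The subtraction can produce negative or missing values, which is worth checking before interpreting the histogram. A small sketch with invented dates, using base R only:

```r
# Invented dates; in the second case hospitalization precedes onset
onset        <- as.Date(c("2013-04-01", "2013-04-10", NA))
hospitalized <- as.Date(c("2013-04-05", "2013-04-08", "2013-04-12"))

period <- hospitalized - onset                      # difftime object
period_days <- as.numeric(period, units = "days")   # plain numbers, in days

period_days                          # 4 -2 NA
sum(period_days < 0, na.rm = TRUE)   # count of negative periods
```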
6. Negative values occur when the recorded hospitalization date precedes symptom onset. Set them to zero (ifelse) and plot the histogram again
mers$infectious.period2 <- ifelse(mers$infectious.period<0,0,mers$infectious.period)
ggplot(data=mers) +
  geom_histogram(aes(x=infectious.period2)) +
  labs(x='Infectious period', y='Frequency',
       title='Distribution of calculated MERS infectious period (positive values only)',
       caption="Data from: https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv")
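Note that ifelse replaces negative values with zero rather than removing them, and it leaves missing values as NA. The sketch below shows this on invented numbers and compares the result with pmax, a base R alternative that is not used in this tutorial.

```r
period <- c(4, -2, 0, NA, 7)  # invented values, in days

clamped_ifelse <- ifelse(period < 0, 0, period)
clamped_pmax   <- pmax(period, 0)  # base R alternative, not used in the text

clamped_ifelse  # 4 0 0 NA 7
```

Both approaches give the same vector here, so the choice between them is a matter of readability.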
7. Plot a kernel density estimate of the infectious period (geom_density)
ggplot(data=mers) +
  geom_density(mapping=aes(x=infectious.period2)) +
  labs(x='Infectious period', y='Density',
       title='Probability density for MERS infectious period (positive values only)',
       caption="Data from: https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv")
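Unlike a histogram, a density plot is scaled so that the area under the curve is 1, which means the vertical axis shows a density and not a count. The sketch below illustrates this with base R's density function on simulated values. The simulated data and the names period, d and area are assumptions for illustration and are not the MERS data.

```r
set.seed(1)
# Simulated stand-in for infectious periods (not the MERS data)
period <- rexp(200, rate = 1 / 5)

d <- density(period)  # Gaussian kernel, default bandwidth

# Approximate the area under the curve with a Riemann sum
area <- sum(d$y) * diff(d$x[1:2])
area  # close to 1
```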
8. Draw an area plot of the results and display it as an interactive plot (ggplotly)
interactive <- ggplot(data=mers) +
  geom_area(stat='bin', mapping=aes(x=infectious.period2)) +
  labs(x='Infectious period', y='Frequency',
       title='Area plot for MERS infectious period (positive values only)',
       caption="Data from: https://github.com/rambaut/MERS-Cases/blob/gh-pages/data/cases.csv")
ggplotly(interactive)